# Multimodal Feature Extraction

Openvision Vit Base Patch16 384
Apache-2.0
OpenVision is a fully open, cost-effective family of advanced vision encoders focused on image feature extraction in multimodal learning.
Multimodal Fusion
UCSC-VLAA
43
2
Mlcd Vit Bigg Patch14 448
MIT
MLCD-ViT-bigG is an advanced Vision Transformer model enhanced with 2D Rotary Position Encoding (RoPE2D), excelling in document understanding and visual question answering tasks.
Text Recognition
DeepGlint-AI
1,517
3
Internvit 300M 448px V2 5
MIT
InternViT-300M-448px-V2_5 is a major upgrade of InternViT-300M-448px that strengthens visual feature extraction through ViT incremental learning and NTP loss. It performs particularly well on multilingual OCR data and on complex inputs such as mathematical charts.
Text-to-Image
OpenGVLab
23.29k
33
Coin Clip Vit Base Patch32
Apache-2.0
A coin image retrieval model fine-tuned from CLIP, with improved feature extraction for coin images.
Image-to-Text Transformers
breezedeus
886
4
Eva02 Large Patch14 224.mim M38m
MIT
EVA02 feature/representation model, pretrained on Merged-38M dataset via masked image modeling, suitable for image classification and feature extraction tasks.
Image Classification Transformers
timm
571
0
Taiyi CLIP RoBERTa 326M ViT H Chinese
Apache-2.0
The first open-source Chinese CLIP model, pre-trained on 123 million image-text pairs, with RoBERTa-large architecture as the text encoder.
Text-to-Image Transformers Chinese
IDEA-CCNL
108
10
Taiyi CLIP Roberta Large 326M Chinese
Apache-2.0
The first open-source Chinese CLIP model, pre-trained on 123 million image-text pairs, supporting Chinese image-text feature extraction and zero-shot classification.
Text-to-Image Transformers Chinese
IDEA-CCNL
10.37k
39
Taiyi CLIP Roberta 102M Chinese
Apache-2.0
The first open-source Chinese CLIP model, pre-trained on 123 million image-text pairs, with a text encoder based on RoBERTa-base architecture.
Text-to-Image Transformers Chinese
IDEA-CCNL
558
51